
    Algebraic tools in phylogenomics.

    In this interdisciplinary thesis we develop algebraic tools for problems in phylogenetics and genomics. Stochastic evolutionary models are often used to study the molecular evolution of species. Evolution is represented on a tree (called a phylogenetic tree) whose leaves represent current species and whose interior nodes correspond to their common ancestors. The length of a branch represents the amount of mutation that has occurred between the two species adjacent to that branch. The evolution of DNA sequences in these species is then modeled as a hidden Markov process along the tree. If the Markov process is assumed to be continuous in time, it is usually assumed to be homogeneous as well, in which case the model parameters are the entries of an instantaneous mutation rate matrix and the branch lengths. If the Markov process is discrete in time, the model parameters are the conditional probabilities of nucleotide substitution along the tree, and no homogeneity is assumed. The latter are the models we consider in this thesis, and they are therefore more general than the homogeneous continuous-time ones.
    From this perspective we study the most basic problems of phylogenetics: given a set of DNA sequences, which evolutionary model best fits the data, and how can we efficiently infer the model parameters? Moreover, as we have also verified in this thesis, species may have evolved not along a single tree but along a mixture of trees, so these questions must also be addressed in that more general setting. For continuous-time homogeneous evolutionary models, several solutions to these questions have been proposed over the last decades. In this thesis we solve both problems for discrete-time evolutionary models using techniques from linear algebra, group theory, algebraic geometry and algebraic statistics. In addition, our solution to the first problem is also valid for phylogenetic mixtures. We have tested the proposed methods on simulated data and on real data from the ENCODE project (Encyclopedia Of DNA Elements). To test our methods, we also provide algorithms that generate sequences evolving under a discrete-time model with a prescribed expected number of mutations, and we prove that these algorithms generate all possible sequences (for most models). Tests on simulated data show that the proposed methods are very accurate, and the results on real data corroborate previously formulated hypotheses. All the methods proposed in this thesis have been implemented for an arbitrary number of species and are publicly available.
    Postprint (published version)
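As a rough illustration of the discrete-time setting described in this abstract, the sketch below simulates nucleotide sequences down a rooted tree in which every edge carries its own matrix of conditional substitution probabilities. It is plain forward simulation, not the thesis's algorithm: it does not control the expected number of mutations, and the tree encoding, the function names and the `jc_like` helper are all assumptions made for this example.

```python
import random

NUCLEOTIDES = "ACGT"


def sample(dist, rng):
    """Draw one nucleotide from a probability vector ordered as A, C, G, T."""
    return rng.choices(NUCLEOTIDES, weights=dist, k=1)[0]


def evolve(tree, root_dist, length, rng):
    """Simulate `length` independent alignment columns on a rooted tree.

    Hypothetical encoding: a leaf is a string name; an internal node is a
    list of (child, matrix) pairs, where matrix[i][j] is the conditional
    probability of state j at the child given state i at the parent.
    Each edge has its own matrix, so no homogeneity is assumed.
    """
    leaves = {}

    def descend(node, states):
        if isinstance(node, str):
            leaves[node] = "".join(states)
            return
        for child, matrix in node:
            # Substitute every site independently along this edge.
            child_states = [
                sample(matrix[NUCLEOTIDES.index(s)], rng) for s in states
            ]
            descend(child, child_states)

    root_states = [sample(root_dist, rng) for _ in range(length)]
    descend(tree, root_states)
    return leaves


def jc_like(p):
    """Illustrative 4x4 matrix: stay with probability 1 - p, else uniform."""
    return [[1 - p if i == j else p / 3 for j in range(4)] for i in range(4)]


# Three hypothetical taxa; each edge uses a different substitution matrix.
tree = [
    ("taxon1", jc_like(0.10)),
    ([("taxon2", jc_like(0.20)), ("taxon3", jc_like(0.05))], jc_like(0.15)),
]
alignment = evolve(tree, [0.25] * 4, 20, random.Random(0))
for name, seq in sorted(alignment.items()):
    print(name, seq)
```

Passing a seeded `random.Random` keeps the simulation reproducible. Any row-stochastic 4x4 matrix can be placed on an edge, which is what distinguishes this setting from a continuous-time homogeneous model with a single rate matrix.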

    The space of phylogenetic mixtures for equivariant models

    The selection of the most suitable evolutionary model for analyzing given molecular data is usually left to the biologist's choice. In his famous book, J. Felsenstein suggested that certain linear equations satisfied by the expected probabilities of patterns observed at the leaves of a phylogenetic tree could be used for model selection. Whether these equations suffice to characterize the evolutionary model remained an open question. Here we prove that, for equivariant models of evolution, the space of distributions satisfying these linear equations coincides with the space of distributions arising from mixtures of trees on a set of taxa. In other words, we prove that an alignment is produced from a mixture of phylogenetic trees under an equivariant evolutionary model if and only if its distribution of column patterns satisfies the linear equations mentioned above. Moreover, for each equivariant model and for any number of taxa, we provide a set of linearly independent equations defining this space of phylogenetic mixtures. This is a powerful tool that has already been used successfully in model selection. We also use the results obtained to study identifiability issues for phylogenetic mixtures.
    Comment: 28 pages, 1 figure; to appear in Algorithms for Molecular Biology

    The C-terminal domain of the Escherichia coli RNA polymerase α subunit plays a role in the CI-dependent activation of the bacteriophage λ pM promoter

    The bacteriophage λ pM promoter is required for maintenance of the λ prophage in Escherichia coli, as it facilitates transcription of the cI gene, encoding the λ repressor (CI). CI levels are maintained through a transcriptional feedback mechanism whereby CI can serve as an activator or a repressor of pM. CI activates pM through cooperative binding to the OR1 and OR2 sites within the OR operator, with the OR2-bound CI dimer making contact with domain 4 of the RNA polymerase σ subunit (σ4). Here we demonstrate that the 261 and 287 determinants of the C-terminal domain of the RNA polymerase α subunit (αCTD), as well as the DNA-binding determinant, are important for CI-dependent activation of pM. We also show that the location of αCTD at the pM promoter changes in the presence of CI. Thus, in the absence of CI, one αCTD is located on the DNA at position −44 relative to the transcription start site, whereas in the presence of CI, αCTD is located at position −54, between the CI-binding sites at OR1 and OR2. These results suggest that contacts between CI and both αCTD and σ are required for efficient CI-dependent activation of pM.

    Infection of Semen-Producing Organs by SIV during the Acute and Chronic Stages of the Disease

    BACKGROUND: Although indirect evidence suggests the male genital tract as a possible source of persistent HIV shedding in semen during antiretroviral therapy, this phenomenon is poorly understood due to the difficulty of sampling semen-producing organs in HIV+ asymptomatic individuals. METHODOLOGY/PRINCIPAL FINDINGS: Using a range of molecular and cell biological techniques, this study investigates SIV infection within reproductive organs of macaques during the acute and chronic stages of the disease. We demonstrate for the first time the presence of SIV in the testes, epididymides, prostate and seminal vesicles as early as 14 days post-inoculation. This infection persists throughout the chronic stage and positively correlates with blood viremia. The prostate and seminal vesicles appear to be the most efficiently infected reproductive organs, followed by the epididymides and testes. Within the male genital tract, mostly T lymphocytes and a small number of germ cells harbour SIV antigens and RNA. In contrast to the other organs studied, the testis does not display an immune response to the infection. Testosteronemia is transiently increased during the early phase of the infection but spermatogenesis remains unaffected. CONCLUSIONS/SIGNIFICANCE: The present study reveals that SIV infection of the macaque male genital tract is an early event and that semen-producing organs display differential infection levels and immune responses. These results help elucidate the origin of HIV in semen and constitute an essential basis for improving the design of antiretroviral therapies to eradicate virus from semen.

    Host hindrance to HIV-1 replication in monocytes and macrophages

    Monocytes and macrophages are targets of HIV-1 infection and play critical roles in multiple aspects of viral pathogenesis. HIV-1 can replicate in blood monocytes, although only a minor proportion of circulating monocytes harbor viral DNA. Resident macrophages in tissues can be infected and function as viral reservoirs. However, their susceptibility to infection, and their capacity to actively replicate the virus, varies greatly depending on the tissue localization and cytokine environment. The susceptibility of monocytes to HIV-1 infection in vitro depends on their differentiation status. Monocytes are refractory to infection and become permissive upon differentiation into macrophages. In addition, the capacity of monocyte-derived macrophages to sustain viral replication varies between individuals. Host determinants regulate HIV-1 replication in monocytes and macrophages, limiting several steps of the viral life-cycle, from viral entry to virus release. Some host factors responsible for HIV-1 restriction are shared with T lymphocytes, but several anti-viral mechanisms are specific to either monocytes or macrophages. Whilst a number of these mechanisms have been identified in monocytes or in monocyte-derived macrophages in vitro, some of them have also been implicated in the regulation of HIV-1 infection in vivo, in particular in the brain and the lung where macrophages are the main cell type infected by HIV-1. This review focuses on cellular factors that have been reported to interfere with HIV-1 infection in monocytes and macrophages, and examines the evidence supporting their role in vivo, highlighting unique aspects of HIV-1 restriction in these two cell types.

    Bayesian network studies for splicing regulatory elements


    A Heuristic Bayesian Method for Segmenting DNA Sequence Alignments and Detecting Evidence for Recombination and Gene Conversion

    We propose a heuristic approach to the detection of evidence for recombination and gene conversion in multiple DNA sequence alignments. The proposed method consists of two stages. In the first stage, a sliding window is moved along the DNA sequence alignment, and phylogenetic trees are sampled from the conditional posterior distribution with MCMC. To reduce the noise intrinsic to inference from the limited amount of data available in the typically short sliding window, a clustering algorithm based on the Robinson-Foulds distance is applied to the trees thus sampled, and the posterior distribution over tree clusters is obtained for each window position. While changes in this posterior distribution are indicative of recombination or gene conversion events, it is difficult to decide when such a change is statistically significant. This problem is addressed in the second stage of the proposed algorithm, where the distributions obtained in the first stage are post-processed with a Bayesian hidden Markov model (HMM). The emission states of the HMM are associated with posterior distributions over phylogenetic tree topology clusters. The hidden states of the HMM indicate putative recombinant segments. Inference is done in a Bayesian sense, sampling parameters from the posterior distribution with MCMC. Of particular interest is the determination of the number of hidden states as an indication of the number of putative recombinant regions. To this end, we apply reversible jump MCMC, and sample the number of hidden states from the respective posterior distribution.
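The clustering step in the first stage relies on the Robinson-Foulds distance between tree topologies. The sketch below shows only that distance, computed as the number of bipartitions (splits) present in one tree but not the other. The MCMC sampling, the clustering and the HMM post-processing are not reproduced. The nested-tuple tree encoding and all function names are assumptions made for this example.

```python
def leaf_set(tree):
    """Collect the leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree, all_leaves):
    """Return the non-trivial bipartitions of the leaves induced by edges."""
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        other = all_leaves - below
        # Skip trivial splits that separate fewer than two leaves.
        if len(below) > 1 and len(other) > 1:
            # An unordered pair, so the rooting does not affect the result.
            found.add(frozenset((below, other)))
        return below

    leaves_below(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Count the splits that occur in exactly one of the two trees."""
    leaves = leaf_set(tree_a)
    if leaves != leaf_set(tree_b):
        raise ValueError("trees must be defined on the same leaf set")
    return len(splits(tree_a, leaves) ^ splits(tree_b, leaves))


# Hypothetical five-taxon topologies that disagree on one internal edge.
first = ((("a", "b"), "c"), ("d", "e"))
second = ((("a", "c"), "b"), ("d", "e"))
print(robinson_foulds(first, second))
```

Because each split is stored as an unordered pair of leaf sets, two differently rooted drawings of the same unrooted topology are at distance zero, which suits a comparison of topologies sampled in neighbouring windows.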